An Approach to Proper Name Tagging for German
نویسنده
چکیده
This paper presents an incremental method for the tagging of proper names in German newspaper texts. The tagging is performed by the analysis of the syntactic and textual contexts of proper names together with a morphological analysis. The proper names selected by this process supply new contexts which can be used for finding new proper names, and so on. This procedure was applied to a small German corpus (50,000 words) and correctly disambiguated 65% of the capitalized words, which should improve when it is applied to a very large corpus.
منابع مشابه
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملروشی جدید جهت استخراج موجودیتهای اسمی در عربی کلاسیک
In Natural Language Processing (NLP) studies, developing resources and tools makes a contribution to extension and effectiveness of researches in each language. In recent years, Arabic Named Entity Recognition (ANER) has been considered by NLP researchers due to a significant impact on improving other NLP tasks such as Machine translation, Information retrieval, question answering, query result...
متن کاملTagging Unknown Proper Names Using Decision Trees
This paper describes a supervised learning method to automatically select from a set of noun phrases, embedding proper names of different semantic classes, their most distinctive features. The result of the learning process is a decision tree which classifies an unknown proper name on the basis of its context of occurrence. This classifier is used to estimate the probability distribution of an ...
متن کاملMorphological features help POS tagging of unknown words across language varieties
Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRCHong Kong, and Taiwan). An analytic study first showed that unknown words were a major source of ...
متن کاملAn unusual necrotic myositis by Clostridium perfringens in a German Shepherd dog: A clinical report, bacteriological and molecular identification
Clostridial myositis, considered to be rare in pet animals, is an acutely fatal toxaemic condition. Some species of clostridia are responsible for necrotic myositis. A 2-year-old male German shepherd dog was admitted with non-weight bearing lameness and massive swelling of the left hind limb. Clostridium perfringens type A with alpha toxin was diagnosed as a pathogenic agent. Based on ...
متن کامل